-
Notifications
You must be signed in to change notification settings - Fork 197
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve and unify ingestion server logging, simplify docker CMDs #2639
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested locally. The logs are much cleaner.
Do you know why we get an error
logs at the beginning? Here, notice the gunicorn.error
:
[2023-07-14 08:04:06,168 - gunicorn.error - 264][INFO] Starting gunicorn 20.1.0
[2023-07-14 08:04:06,169 - gunicorn.error - 267][DEBUG] Arbiter booted
[2023-07-14 08:04:06,169 - gunicorn.error - 264][INFO] Listening at: http://0.0.0.0:8002 (1)
[2023-07-14 08:04:06,169 - gunicorn.error - 264][INFO] Using worker: sync
[2023-07-14 08:04:06,171 - gunicorn.error - 264][INFO] Booting worker with pid: 7
[2023-07-14 08:04:06,260 - gunicorn.error - 267][DEBUG] 1 workers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
@obulat I think that's just the logger that's delegated to log those specifics 🤷🏼♀️ Seems like a gunicorn internal thing, since the logging config is copied almost wholesale with only a few modifications. In short, no I'm not sure why! But it seems fine 🙂 |
Fixes
This PR is an effort towards addressing https://github.com/WordPress/openverse-infrastructure/issues/529, which seeks to handle multi-line log capturing in Cloudwatch. While working on that issue, I noticed that the ingestion server actually had two separate log formats: one coming from gunicorn and the other coming from the standard python in-process logging. Here's an example of them mixing in production:
While we could set up a matcher for these two different formats, Cloudwatch also supports using the timestamp parsing format as the multiline matcher, which seems ideal for us. The timestamp parser is very naive, so we needed a single format per service in order to handle this appropriately.
Description
This PR modifies the gunicorn configuration for both the ingestion server and the indexer worker to provide a single logging format for all messages emitted by the service while preserving the various log levels. As an added bonus, we also gain the
gunicorn.access
logs, which were previously not showing up for the ingestion server.Previously, we had separate configuration files for the ingestion server and the indexer worker. These files were nearly identical (all that differs now is the port binding and the actual WSGI application declaration). With the new logging config which was added, I decided that it'd be better to use the same config across both containers and only declare the differences in the actual command which runs the container. This prevents us having two verbose configurations which we would need to maintain two copies of.
Additionally, the docker-compose file within the ingestion server folder was removed. I believe this was intended to be used similar to the catalog's version (which was recently moved into the infra repo anyway). However we recreate the docker-compose file when deploying in the init file anyway, so this is no longer useful.
This will be followed up by a PR in the infrastructure repo with adjustments for deployment that are necessary given this new setup.
Logging comparison
Below are the logs for
just api/init
, before and after these changes. The important differences to note are that the log format is now all the same and we get a bit more information regarding access logs!Logs before
Logs after
Testing Instructions
Run
just build && just down -v
thenjust api/init
, followed withjust logs ingestion_server
. You should see the new log format in the produced logs!Checklist
Update index.md
).main
) or a parent feature branch.Developer Certificate of Origin
Developer Certificate of Origin